Abstract

The information that a pattern of firing in the output layer of a feedforward
network of threshold-linear neurons conveys about the network's inputs is
considered. A replica-symmetric solution is found to be stable for all but
small amounts of noise. The region of instability depends on the contribution
of the threshold and the sparseness: for distributed pattern distributions,
the unstable region extends to higher noise variances than for very sparse
distributions, for which it is almost nonexistent.
I. INTRODUCTION



Advances in techniques for the formal analysis of neural networks offer insight into
the behaviour of models of biological interest. Of particular interest are methods which
allow the calculation of the information that can be conveyed by a given neural structure,
as these offer both useful intuitions and the prospect of conducting pertinent experiments.
The replica trick has been used to achieve this in the case of binary units and
threshold-linear units by appealing to an assumption of replica symmetry. In the
case of binary units with continuous inputs, the validity of the replica-symmetric ansatz is
justified by the duality with the Gardner calculation of the storage capacity for continuous
couplings. We now analyse the stability of the replica-symmetric solution for mutual
information in a network of threshold-linear units.

The model describes a feedforward network of threshold-linear units with partially diluted
connectivity. This is a simpler version of the 'Schaffer collateral' calculation described elsewhere. In the calculation
considered here, there is only one mode of operation (which we might call "transmission"),
as opposed to the division into storage and recall modes in that calculation. There are $N$
cells in the input layer, and $M$ (proportional to $N$) in the output layer. The limit of interest
is $N \to \infty$.

$\{\eta_i\}$ are the firing rates of each cell $i$ in the input layer. The probability density of finding
a given firing pattern is taken to be:

$$P(\{\eta_i\}) \prod_i d\eta_i = \prod_i P_\eta(\eta_i)\, d\eta_i \qquad (1)$$

Each input cell is thus assumed to code independent information. 

$\{\xi_j\}$ are the firing rates produced in each cell in the output layer. They are determined
by the matrix multiplication of the pattern $\{\eta_i\}$ with the synaptic weights $J_{ij}$, followed by
Gaussian distortion, thresholding and rectification.

$$\xi_j = \left[ \xi_0 + \sum_i c_{ij} J_{ij} \eta_i + \epsilon_j \right]^+ \qquad (2)$$

$$\langle (\epsilon_j)^2 \rangle = \sigma_\epsilon^2 \qquad (3)$$

Each output cell receives Cj (which we will take to be of the order of 10^) connections from 
input layer cells: 

$$c_{ij} \in \{0,1\}, \qquad \langle c_{ij} \rangle N = C_j, \qquad (C \equiv \langle C_j \rangle) \qquad (4)$$

The mean value across all patterns of each synaptic weight is taken to be equal across
synapses, and is therefore absorbed into the threshold term. The synaptic weights $J_{ij}$ are thus
of zero mean and variance $\sigma_J^2$ (only the first two moments of their distribution affect
the calculation).

$$\langle (J_{ij})^2 \rangle = \sigma_J^2 \qquad (5)$$
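
As a rough illustration of the model defined by Eqs. (2)-(5), the following Python sketch draws one noisy output pattern from a binary input pattern. The function name `transmit`, the finite layer sizes and all parameter values are assumptions made for illustration (with $\sigma_J^2 = 1/C$ and $r = M/N = 2$ as quoted in the caption of Fig. 4). Independent Bernoulli dilution and Gaussian weights are only one choice of distributions with the stated first two moments.

```python
import numpy as np

def transmit(eta, xi0, C, sigma_J, sigma_eps, M, rng):
    """One noisy pass through a diluted threshold-linear layer (cf. Eq. 2)."""
    N = eta.shape[0]
    c = rng.random((M, N)) < C / N             # c_ij in {0,1}, so each output gets ~C inputs
    J = rng.normal(0.0, sigma_J, size=(M, N))  # zero-mean weights, variance sigma_J^2
    eps = rng.normal(0.0, sigma_eps, size=M)   # Gaussian noise, variance sigma_eps^2
    h = xi0 + (c * J) @ eta + eps              # threshold term + diluted weighted sum + noise
    return np.maximum(h, 0.0)                  # thresholding and rectification

rng = np.random.default_rng(0)
N, M, C, a = 1000, 2000, 100, 0.05             # illustrative sizes only
eta = (rng.random(N) < a).astype(float)        # binary pattern, fraction ~a of units "on"
xi = transmit(eta, xi0=-0.4, C=C, sigma_J=1 / np.sqrt(C), sigma_eps=0.3, M=M, rng=rng)
print(xi.shape, float((xi > 0).mean()))        # output size and fraction of active outputs
```

With a negative threshold term most output units are silenced by the rectification, which is the sense in which $\xi_0 < 0$ gives threshold-linear rather than linear behaviour.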
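The average of the mutual information

$$i = \frac{1}{N} \int \prod_i d\eta_i \prod_j d\xi_j\, P(\{\eta_i\},\{\xi_j\}) \ln \frac{P(\{\eta_i\},\{\xi_j\})}{P(\{\eta_i\})\, P(\{\xi_j\})} \qquad (6)$$

over the quenched variables $c_{ij}$, $J_{ij}$ is written using the replica trick as

$$\langle i \rangle_{c,J} = \lim_{n \to 0} \frac{1}{nN} \left\langle \int \prod_i d\eta_i \prod_j d\xi_j\, P(\{\eta_i\},\{\xi_j\}) \left\{ \left[ \frac{P(\{\eta_i\},\{\xi_j\})}{P(\{\eta_i\})} \right]^n - \left[ P(\{\xi_j\}) \right]^n \right\} \right\rangle_{c,J} \qquad (7)$$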

The calculation is valid only for non-zero noise variance, and it will be seen that the only 
region in which the solution is not well behaved is that of very low noise variance. 
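
The replica trick rests on the identity $\ln x = \lim_{n \to 0} (x^n - 1)/n$. The Python sketch below checks this on a small discrete joint distribution, comparing the direct mutual information with the finite-$n$ replica form. The table `P`, the function names and the value of `n` are illustrative assumptions; the quenched average and the $1/N$ normalisation of the full calculation are omitted.

```python
import numpy as np

def mutual_information(P):
    """Direct mutual information (in nats) of a strictly positive joint table P[eta, xi]."""
    P_eta = P.sum(axis=1, keepdims=True)   # marginal over outputs, shape (rows, 1)
    P_xi = P.sum(axis=0, keepdims=True)    # marginal over inputs, shape (1, cols)
    return float(np.sum(P * np.log(P / (P_eta @ P_xi))))

def replica_estimate(P, n):
    """Finite-n replica expression; its n -> 0 limit is the mutual information."""
    P_eta = P.sum(axis=1, keepdims=True)
    P_xi = P.sum(axis=0, keepdims=True)
    cond = P / P_eta                       # P(xi | eta)
    return float(np.sum(P * (cond**n - P_xi**n)) / n)

# A toy joint distribution (rows: input states, columns: output states).
P = np.array([[0.30, 0.10, 0.05],
              [0.05, 0.20, 0.30]])
direct = mutual_information(P)
via_replicas = replica_estimate(P, n=1e-6)
print(direct, via_replicas)
```

The two numbers agree up to a correction of order $n$, which is why the limit $n \to 0$ is taken at the end of the calculation.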



II. CALCULATION OF MUTUAL INFORMATION 



First, introducing replica indices $\alpha = 1, \ldots, n+1$, and breaking the integral over $\xi$ into
subthreshold and suprathreshold components, we observe that



lim — < 

n^O n 



+ J dr] 
(8)



This allows us to treat both terms of Eq. (7) in the same manner. To obtain the required probability
density, we use Dirac delta functions to implement the constraints defined by Eq. (2):



p{v",c: 



cJ 
xP(Kn)
(9)



where

$$Du \equiv \frac{du}{\sqrt{2\pi}}\, e^{-u^2/2} \qquad (10)$$



Using the integral form of the Dirac delta function introduces a Lagrange multiplier $x^\alpha$.
The integrals over the noise and interaction distributions are performed, and the quenched
average over the connections is performed in the thermodynamic limit, so that
2 ^ "J -J



a\\n+l 



where $\delta_{\alpha\beta}$ is the Kronecker delta. A Lagrange multiplier



(11) 



(12) 



is introduced using the integral form of the Dirac delta function via an auxiliary variable
$\tilde{z}^{\alpha\beta}$. We then obtain



P{V, I) 



n+l 



(a/3) 



ja/3 



Tr In M|(27r)-'^P({77f })"+^ 



(13) 



where $M = \sigma_\epsilon^2 I + \sigma_J^2 C Z$ and $\Sigma = M^{-1}$. $Z$ is the matrix with elements $z^{\alpha\beta}$, and $(\alpha\beta)$ is the
pair $\alpha\beta$, $\alpha \neq \beta$.
Thus



where 



2n/N 

-/(n^)(n^^)exp 



iiV E '2°^'' - NHAiz'^) - Mg{z'^, z") 



iiV E '^''5" + E 



(14) 



(z-)^ I d77P(77)exp -E^V 



(15) 



(5",z°^) = / (ncir?"P(?7"))exp (-E^"(^")' - E 5"V?7' 

■'^ a \ " (a/3) 



(16) 



-^(.", = e-i^'"^( r ^ exp -l(e - Co)^ E Ea, 



+ I (Il^Jexp 



00 Q v27r' 



-^Er-eo)^a/3(l^-eo) 

^ a/3 



(17) 
III. REPLICA SYMMETRIC SOLUTION



The assumption of replica symmetry can be written 



(18) 



The saddle-point method is utilized in the thermodynamic limit, yielding the saddle-point
equations

(19a) 



ZoA = {V 



zqa = 



(19b) 



Zqb = 



Zqb = 



(19c) 
(19d) 



Zl 



Ds ^(r/^ + -^) exp(-— 77^ - sy/l^r])j In ^exp(-— 77^ - sy/z^r])J (19e) 



Zl — —ajCr< 
{Pb + Qb)^^^ VPb + qs Pb \VPb + Qb ,



L 



1 + ln(/) 



'-io-U/OB^ 



/-Co-tyg^\ -3/2/^ , t{pB + qB) 
<y I = ]Pb Ko + - 



and the expression for the information per input cell 



(19f) 



(i) = r G{pA, Qa) + -^zizi - rG{pB, Qb) 



where 



J Ds (exp{~zir)'^ - sy/Ilr))'^ In ^exp(-izi?7^ - sy^r^)^ 
(20)



(21)
and



(j){x) 



aix] 



driP{ri)x{ri) 
PA = (T,



OA 



Pb = (yl + a]C{zQB - zi) 



(22) 



We refer to r = M/N as the anatomical divergence. 

This expression must in general be evaluated numerically. However, considering some
limiting cases can give us some insight into the behaviour of the solution. In particular, the
limit of linear processing can be obtained by taking $\xi_0 \to +\infty$. In this limit, Eq. (19f) reduces
to



Pb 



(23) 



The information per neuron obtained in the linear limit is 



1 Pb . I ~ 
-rln h -ZiZi 

2 Pa 2 



J Ds (exp{-^zii]'^ - sy/YiT])^ In ^exp(-^2;i?7^ - sy/z^r])^ 



(24) 



The information obtained in this limit is bounded by that which would be obtained from 
a simple Gaussian channel calculation, where we consider the channel 



(25) 



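and perform the annealed and quenched averages to obtain the signal variance $\sigma_J^2 C (\langle \eta^2 \rangle - \langle \eta \rangle^2)$, and information per input cell

$$i_{\rm gauss} = \frac{r}{2} \ln \left[ 1 + \frac{\sigma_J^2 C \left( \langle \eta^2 \rangle - \langle \eta \rangle^2 \right)}{\sigma_\epsilon^2} \right] \qquad (26)$$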



The Gaussian channel information provides an upper limit corresponding to the optimal $\eta$
distribution (for transmitting maximal information given a constraint on the signal power),
and to output cells that do not depend on the same inputs.

Within the linear limit, we can consider the special case of high noise variance (low signal
to noise ratio). As $\sigma_\epsilon^2 \to \infty$,
and



h - (27) 



$$z_1 \to \langle \eta \rangle^2 + O(\tilde{z}_1). \qquad (28)$$
The information therefore falls to zero as 

(.)^«|^. (29) 

i.e. inversely with noise variance, as one would expect. We can thus see that for linear
neurons with low signal to noise ratio, the transmitted information approaches the Gaussian
channel limit.
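
The inverse dependence on noise variance can be checked numerically for the Gaussian channel bound itself. The sketch below assumes the bound has the standard form $(r/2)\ln(1 + \text{signal variance}/\sigma_\epsilon^2)$ with signal variance $\sigma_J^2 C(\langle\eta^2\rangle - \langle\eta\rangle^2)$. The function name and the parameter values ($r = 2$, $\sigma_J^2 = 1/C$, binary patterns with $a = 0.05$) are illustrative assumptions, and the result is in nats.

```python
import numpy as np

def gaussian_channel_info(r, sigma_J2, C, var_eta, sigma_eps2):
    """Assumed form of the bound: (r/2) ln(1 + signal variance / noise variance), in nats."""
    snr = sigma_J2 * C * var_eta / sigma_eps2
    return 0.5 * r * np.log1p(snr)

r, C, a = 2.0, 100, 0.05
var_eta = a * (1 - a)                    # variance of a binary pattern with fraction a "on"
for s2 in (10.0, 100.0, 1000.0):
    exact = gaussian_channel_info(r, 1.0 / C, C, var_eta, s2)
    approx = 0.5 * r * (1.0 / C) * C * var_eta / s2   # leading term, proportional to 1/sigma_eps^2
    print(s2, exact, approx)
```

As the noise variance grows, the logarithm is dominated by its first-order term, so the bound itself falls off inversely with $\sigma_\epsilon^2$.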

The numerical solution of the mutual information expression, as a function of the noise
variance, is shown in Fig. 1, both for the case of linear units and for units with a threshold of
$\xi_0 = -0.4$, representing threshold-linear behaviour. This is shown for a binary pattern
distribution of sparseness $a$, where the sparseness of a distribution is a mean-invariant measure
of spread and is defined in general as

$$a = \frac{\langle \eta \rangle^2}{\langle \eta^2 \rangle} \qquad (30)$$

This measure is 'more sparse' for smaller $a$, and reduces to the fraction of units 'on' in the
case of a binary distribution. The Gaussian channel bound appears on the same graphs for
comparison.
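
A short Python check of the sparseness measure, assuming the definition $a = \langle\eta\rangle^2 / \langle\eta^2\rangle$ (the form consistent with the statement that it reduces to the fraction of active units for binary patterns). The function name and the example patterns are illustrative.

```python
import numpy as np

def sparseness(eta):
    """a = <eta>^2 / <eta^2>, averaged over the units (or patterns) in eta."""
    eta = np.asarray(eta, dtype=float)
    return eta.mean() ** 2 / (eta ** 2).mean()

binary = np.array([1.0] * 5 + [0.0] * 95)      # 5 of 100 units "on"
graded = np.linspace(0.0, 1.0, 100)            # evenly graded firing rates
print(sparseness(binary))                      # close to the fraction "on", 0.05
print(sparseness(3.0 * binary))                # unchanged when all rates are rescaled
print(sparseness(graded))                      # a more distributed code gives a larger a
```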

The mutual information should be bounded by the pattern entropy as the noise variance
becomes very small. As the noise variance decreases, the replica-symmetric solution
approaches this bound in both the linear and threshold-linear cases. It can be seen, however,
that for very small noise variances, the replica-symmetric solution changes direction
and crosses this physical boundary. Inspection of Eq. (21) reveals divergence of the mutual
information solution in the limit $\sigma_\epsilon^2 \to 0$; this is in keeping with our intuition from the
beginning that the calculation should not be valid in the deterministic limit. However, for
such low noise variance the information has essentially saturated in any case. For
threshold-linear neurons, the solution is also unstable to replica-symmetry-breaking fluctuations for
relatively low noise variance, as will be discussed in the next section.



IV. STABILITY OF THE REPLICA-SYMMETRIC SOLUTION 

The stability of the replica-symmetric solution is analysed after the style of de Almeida
and Thouless. For the solution for the free energy this was addressed in the context of
Hopfield-Little type autoassociative neural networks, and for an autoassociator with
threshold-linear units and for a threshold-linear variant of the Sherrington-Kirkpatrick model.
For the solution for another quantity, the Gardner volume, this was addressed for Ising
(±1) neurons. In contrast, here we are determining the stability of the solution for mutual
information in a network composed of threshold-linear neurons, although the technique
proceeds very similarly.

Footnote (to the low signal-to-noise limit of Sec. III): It can also be shown (we have done so
for the case of a Gaussian $\eta$ distribution) that as $r \to 0$, the Gaussian channel bound is also reached.

Fluctuations in the transverse (replica-symmetry-breaking, RSB) and longitudinal
(replica-symmetric, RS) directions are decoupled, and hence can be analysed separately.
Longitudinal fluctuations can be disregarded if a unique saddle-point is obtained,
which appears to be the case. We will therefore concentrate upon transverse fluctuations.

We wish to consider small deviations in the saddle-point parameters about the
replica-symmetric saddle-point.



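$$z^{\alpha\beta} = z_1 + \delta z^{\alpha\beta}$$
$$\tilde{z}^{\alpha\beta} = \tilde{z}_1 + \delta \tilde{z}^{\alpha\beta} \qquad (31)$$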



Quadratic fluctuations in the function 

a (a/3) 



NHBiz", r^) - MGiz", z"^). (32) 



give us the stability matrix 

dz'^^dz^' 

d{ir^)dz^' 



dz'^^d{ii^') 
d{i~z''^)d{i~z'"' 



(33) 



where $\delta_{(\alpha\beta),(\gamma\delta)} = \delta_{\alpha\gamma}\delta_{\beta\delta} + \delta_{\alpha\delta}\delta_{\beta\gamma}$. In contrast to previous calculations based on quantities
such as free energy, the expression for mutual information involves $n+1$ replicas. There are
$n(n+1)/2$ independent variables $z^{\alpha\beta}$, and the same number of independent $\tilde{z}^{\alpha\beta}$. $\Gamma$ is thus
an $n(n+1) \times n(n+1)$ matrix.

The transverse eigenvalues of this matrix are given by the eigenvalues of the matrix

$$\begin{pmatrix} \lambda_A & 1 \\ 1 & \lambda_B \end{pmatrix} \qquad (34)$$

where $\lambda_A$ and $\lambda_B$ are the transverse eigenvalues of the submatrices $A^{(\alpha\beta),(\gamma\delta)}$ and $B^{(\alpha\beta),(\gamma\delta)}$
respectively. Calculation of these involves consideration of the symmetry properties of the
submatrices, and is detailed in the Appendix. The eigenvalue equations reduce to

$$\lambda_A + c = \lambda$$
$$1 + c\lambda_B = c\lambda \qquad (35)$$



We thus have the two replicon mode eigenvalues

$$\lambda_\pm = \frac{1}{2}(\lambda_A + \lambda_B) \pm \sqrt{\frac{1}{4}(\lambda_A - \lambda_B)^2 + 1} \qquad (36)$$
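
Eliminating $c$ from Eqs. (35) gives $(\lambda - \lambda_A)(\lambda - \lambda_B) = 1$, whose roots are the eigenvalues of a symmetric $2 \times 2$ matrix with diagonal entries $\lambda_A$, $\lambda_B$ and unit off-diagonal entries. The Python sketch below compares that closed form with a direct numerical diagonalisation. The values of `lam_A` and `lam_B` are arbitrary illustrative numbers, not results from this calculation.

```python
import numpy as np

def replicon_eigenvalues(lam_A, lam_B):
    """Roots of (lam - lam_A)(lam - lam_B) = 1, returned as (lam_plus, lam_minus)."""
    mean = 0.5 * (lam_A + lam_B)
    root = np.sqrt(0.25 * (lam_A - lam_B) ** 2 + 1.0)
    return mean + root, mean - root

lam_A, lam_B = -0.7, 1.9                       # arbitrary values for illustration
lam_plus, lam_minus = replicon_eigenvalues(lam_A, lam_B)

# Direct diagonalisation of the 2x2 matrix; eigvalsh returns eigenvalues in ascending order.
numeric = np.linalg.eigvalsh(np.array([[lam_A, 1.0], [1.0, lam_B]]))
print(lam_minus, lam_plus, numeric)
```

Note that the product of the two roots is $\lambda_A \lambda_B - 1$, so its sign is fixed by the product of the two submatrix eigenvalues.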



For stability, the product of the eigenvalues must be non-negative. A further subtlety
is introduced here. $\lambda_+$ can be seen to be $> 0$ irrespective of $\sigma_\epsilon^2$ or $a$. $\lambda_-$, on the other
hand, changes sign, moving from negative to positive for smaller $\sigma_\epsilon^2$. However, intuitively we
expect, from the analogy of the noise with the 'temperature' parameter in other models of
neural networks and physical systems, that if replica-symmetry breaking is to set in,
it will do so at low noise variances. This is confirmed by the eminently sensible behaviour of
the mutual information curves of Fig. 1 at medium to high noise, but nonphysical behaviour
at very low noise values. It can be concluded that, as has occurred in related calculations, a sign reversal has
been introduced due to the integration contour, which must be corrected.

These equations have been numerically solved for $\lambda_-$. Fig. 2 shows the behavior of $\lambda_-$
for a range of sparsenesses and thresholds. Where the eigenvalue passes above the zero axis
(dotted line), a phase of RS instability is indicated. Fig. 2(a) is for the situation of quite sparse
coding of the patterns. As the noise is reduced from the high noise region, in which the RS
solution is stable, the eigenvalue changes sign, and an unstable region is entered. In the case
of threshold $\xi_0 = 0.4$, which represents only a very small degree of threshold-like behavior,
the eigenvalue can be seen to curve back and change sign again at lower noise values still.
Due to non-convergence of numerical integration, it is not possible to examine extremely
small noise values; therefore it is not clear from this diagram whether the eigenvalue also
falls below zero again for the other curves plotted in this figure, or if it instead has a finite
value at zero noise. However, any region of RS stability at noise variances this low would
obviously be irrelevant for the same numerical reasons.

It is apparent from Figs. 2(b) and (c) that as the input distribution is made less sparse
($a$ is increased), the critical amount of noise below which instability arises increases. This
will be discussed again shortly. Another effect that can be seen in Figs. 2(a) and (b) is that,
as the neurons are made more linear ($\xi_0$ is increased), the critical noise first rises, then falls.
This becomes clearer after plotting a phase diagram of noise against $\xi_0$ (Fig. 3). For low
$a$ (sparse distributions), the critical noise rises, falls, and then curves back around on itself:
after the neurons become sufficiently linear, there is no more region of instability. As the
pattern code becomes less sparse, at first the region of instability merely expands. When $a$
reaches a certain value, however, the edge of the unstable region no longer curls in on itself,
but extends outwards. At a sparseness of 0.5, for instance, the critical noise thus first rises
with increasing linearity, taking longer to reach its peak than for more sparse distributions,
then falls, and finally levels off and decreases slowly. The sparseness at which this change in
behavior is exhibited is independent of the parameters of the system, and can be seen from
Fig. 3 to lie somewhere between 0.2 and 0.5.

In the special case of the linear limit, in which $\xi_0 \to \infty$, $\lambda_A$ disappears (see Appendix),
and stability is assured. For finite $\xi_0$ and above the coefficient of sparseness referred to in
the previous paragraph, though, there is a distinct and reasonably large region of instability.

The resulting phase diagrams are shown in Fig. 4. Fig. 4(a) shows the situation for
$\xi_0 = -0.4$, which corresponds to threshold-linear behavior. As $\xi_0$ is increased (Figs. 4(b)-(d);
the neurons are made progressively "more linear"), the critical noise variance at which
instability of the RS solution sets in first increases, and then decreases, as would be expected
from Fig. 3. In Fig. 4(d), the line of critical noise variance abruptly stops at $a \simeq 0.23$: at this
point, the replicon-mode eigenvalue passes below the zero axis, and stability is assured. In
all cases, it is apparent that, in particular for very sparse distributions, the replica-symmetric
equations are valid down to quite low noise. For less sparse coding, where the pattern entropy
is significantly higher, the replica-symmetry-broken solution would seem to be relevant for
higher noise variances.

It should be noted that the sparseness of the distribution of outputs is not the same as 
that of the inputs. This can be determined by 



(37) 



where 




(38) 



The lines of marginal stability for $\xi_0 = -0.4$, $\xi_0 = 0.0$, $\xi_0 = 0.4$ and $\xi_0 = 0.80$ are replotted
in Fig. 5 against the output sparseness. Although the phase diagrams look fairly similar
when plotted as a function of input sparseness, they occupy different regions of the output
sparseness domain because of the thresholding. It is also worth noting that because of the
mapping performed by Eq. (37), the boundaries of the regions in Fig. 4 do not necessarily
form the boundaries of the regions in the output-sparseness plane, which in some instances
constitute points from inside the above curves.

For neurons operating in the threshold-linear regime (left curve, $\xi_0 < 0.0$), where output
sparseness is effectively constrained by the thresholding, the stability characteristics are
qualitatively as has been described earlier. For $\xi_0 = 0.0$, it is apparent from Eqs. (37) and
(38) that the output sparseness is constant (regardless of the input sparseness) at a value of
$1/\pi$. As $\xi_0$ is increased above zero, the output becomes less sparse, and the line of marginal
stability is flipped horizontally (because in this range the entropy is higher for smaller
$a_{\rm out}$; right curves). Assuming that the sparseness of coding in connected sets of neurons in
the brain tends to be similar, the former curve (for threshold-linear behaviour) might be
considered the more biologically applicable, with the threshold in this model incorporating
functionally the constraint on the degree of neural activity.
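
The value $1/\pi$ can be reproduced with a simple sketch that is not the expression of Eqs. (37) and (38) but an assumed stand-in: the total input to an output cell is treated as Gaussian with mean $\xi_0$ and standard deviation $s$ (signal plus noise), and the output sparseness is taken as $\langle\xi\rangle^2/\langle\xi^2\rangle$ of the rectified variable. The function name and the value $s = 0.37$ are illustrative.

```python
import math

def output_sparseness(xi0, s):
    """<xi>^2 / <xi^2> for xi = max(h, 0) with h ~ Normal(xi0, s^2) (Gaussian-input assumption)."""
    u = xi0 / s
    Phi = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))          # probability that h > 0
    phi = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)   # standard normal density at u
    mean = xi0 * Phi + s * phi                                # first moment of the rectified Gaussian
    second = (xi0 ** 2 + s ** 2) * Phi + xi0 * s * phi        # second moment of the rectified Gaussian
    return mean ** 2 / second

for xi0 in (-0.4, 0.0, 0.4, 0.8):
    print(xi0, output_sparseness(xi0, s=0.37))
print(1.0 / math.pi)
```

At $\xi_0 = 0$ the ratio is independent of $s$, which under this assumption is why the output sparseness does not depend on the input sparseness there; for $\xi_0 > 0$ the output is less sparse, and for $\xi_0 < 0$ it is more sparse.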



V. CONCLUSIONS

This paper has detailed the replica-symmetric solution for the information transmitted
by a feedforward network of threshold-linear neurons, and examined its stability to
fluctuations in the direction of replica-symmetry breaking. It appears that for sparse pattern
distributions, replica-symmetry breaking only sets in at noise variances sufficiently small
that we might reasonably consider them to be 'beyond the realm of biological interest', at
least for noisy cortical cells. We believe that, quite importantly, there is every reason to
expect that these results carry over to the slightly more complicated 'Schaffer collateral'
calculation described elsewhere. There is thus reason for confidence in the replica-symmetric
assumption when analysing neural networks in areas such as the hippocampus which are
known to code sparsely.

When more distributed (less sparse) encoding is used, the mutual information solution
is prone to instability to replica-symmetry-breaking fluctuations at higher amounts of noise
than in the sparse case. It is not clear from the current analysis what the quantitative effect
of broken replica symmetry might be, or what the form of the exact solution would be in
that case (e.g. the Parisi ansatz). Care should therefore be taken when analysing the
information conveyed by networks using more distributed encoding.
APPENDIX

In this appendix the transverse eigenvalues of the submatrices $A^{(\alpha\beta),(\gamma\delta)}$ and $B^{(\alpha\beta),(\gamma\delta)}$
are calculated. Both $A^{(\alpha\beta),(\gamma\delta)}$ and $B^{(\alpha\beta),(\gamma\delta)}$ have three different types of matrix elements
depending on whether none, one or two replica indices of the pair $(\alpha\beta)$ equal those of the
pair $(\gamma\delta)$. The three possible values $A^{(\alpha\beta),(\gamma\delta)}$ can take are:
ajCr (g2 + 2pqY
x(0



{k,t)
{k, t) is defined as
d^ 



AW p*{p + qY 
dt 



^271 

di 



6) exp 



2{p + q) 



2{p + q) 



^o)''exp 



2{p + q) 



(39) 



/27r 



x{0 exp 



k 



kt'^' 

(e-eo)-^ 



(40) 



which can be considered to be a weighted average of $x(\xi)$ over the subthreshold values of $\xi$.
$k$ is used to normalise the weight factor over the $t$ integral in each of Eqs. (39). Also,



W 



^0 



+ 



dt 
, — ( 
72^ 



exp 



t^{2p + q) 
A{p + q) 



(41) 



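and $p$, $q$ are here $p_B$ and $q_B$ from Eq. (22).
We have to solve the eigenvalue equation

$$A\psi = \lambda\psi. \qquad (42)$$

The eigenvectors $\psi$ have the column-vector form

$$\psi = \left( \{\delta z^{\alpha\beta}\} \right), \qquad (\alpha < \beta = 1, \ldots, n+1). \qquad (43)$$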



We now proceed as in previous stability analyses of this kind. There are three classes of
eigenvectors (and corresponding eigenvalues): those invariant under interchange of all indices,
those invariant under interchange of all but one index, and those invariant under interchange
of all but two indices. These last describe the transverse mode, in which we are interested.
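
The three classes can be seen numerically for any matrix indexed by replica pairs whose elements take only three values. The Python sketch below assumes the labelling $P$ for identical pairs, $Q$ for pairs sharing one index and $R$ for pairs sharing none. The function name, the number of replicas `m` (standing for $n+1$) and the values of `P`, `Q`, `R` are arbitrary illustrative choices.

```python
import itertools
import numpy as np

def pair_matrix(m, P, Q, R):
    """Matrix over pairs (alpha < beta) of m replicas with three kinds of elements."""
    pairs = list(itertools.combinations(range(m), 2))
    A = np.empty((len(pairs), len(pairs)))
    for i, p in enumerate(pairs):
        for j, q in enumerate(pairs):
            shared = len(set(p) & set(q))      # number of replica indices in common: 0, 1 or 2
            A[i, j] = (R, Q, P)[shared]
    return A

m, P, Q, R = 6, 2.0, 0.5, 0.1                  # m plays the role of n + 1; values are arbitrary
eig = np.linalg.eigvalsh(pair_matrix(m, P, Q, R))
replicon = P - 2 * Q + R                       # transverse (replicon) eigenvalue
degeneracy = int(np.sum(np.isclose(eig, replicon)))
print(np.unique(np.round(eig, 10)))            # three distinct eigenvalues
print(degeneracy, m * (m - 3) // 2)            # multiplicity of the replicon eigenvalue
```

With $m = n + 1$ replicas the replicon multiplicity $m(m-3)/2$ equals $\frac{1}{2}(n+1)(n-2)$.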
Let us consider fluctuations of the form

^^a/3_^a/3 (a</?=l,..,n+l) 



(44) 



with 



2 — n 



(45) 



ensuring orthogonality between the eigenvectors describing RS and RSB fluctuations. As
in the standard analysis, we have for $A^{(\alpha\beta),(\gamma\delta)}$ an eigenvalue

$$\lambda_A = P - 2Q + R \qquad (46)$$

with in this case $\frac{1}{2}(n+1)(n-2)$-fold degeneracy, and $P$, $Q$ and $R$ as described above.
For $B^{(\alpha\beta),(\gamma\delta)}$, we consider fluctuations

$$\delta\tilde{z}^{\alpha\beta} = c\,\Delta^{\alpha\beta} \qquad (\alpha < \beta = 1, \ldots, n+1)$$

and obtain similarly the eigenvalue

$$\lambda_B = P' - 2Q' + R',$$



where 



P' 

Q' 
R' 



oo 

-co 
oo 

oo 



Dt 
Dt 
Dt 



V 

(|,t)U'(|,t) 



and 



$\langle x(\eta) \rangle_{(k,t)}$, the weighted pattern average, is defined as



x{ri) 



{k,t) = / dr]P{r])x{7]) exY> 



(47) 



(48) 



(49) 



(50) 
FIGURES

FIG. 1. Mutual information, measured in bits, as a function of noise variance. The dashed
line is for a threshold $\xi_0 = -0.4$, whereas the solid line is for the limit of linear neurons. The
dot-dashed line indicates the simple Gaussian channel for comparison. The entropy of the input
pattern distribution is indicated by the horizontal dotted line. (a) Input pattern distribution
sparseness of 0.05. (b) Sparseness of 0.50.
FIG. 2. The behavior of the replicon mode eigenvalue $\lambda_-$ as a function of noise variance. (a)
Input sparseness $a = 0.05$. (b) $a = 0.10$. (c) $a = 0.50$. In each of these graphs the solid line indicates
the eigenvalue for threshold $\xi_0 = -0.4$, the dashed curve $\xi_0 = 0.0$, the dot-dashed curve $\xi_0 = 0.4$,
and the dotted curve $\xi_0 = 0.8$. The replica-symmetric solution is unstable in regions where these
curves lie above the horizontal dotted line. In case (a), the $\xi_0 = 0.8$ line lies below the region
examined in the graph.
FIG. 3. A phase diagram showing the critical noise variance as a function of the threshold
parameter $\xi_0$: the larger $\xi_0$ is, the more linear the regime. Solid curve, sparseness $a = 0.05$;
dashed curve, $a = 0.10$; dot-dashed curve, $a = 0.20$; dotted curve, $a = 0.50$.






FIG. 4. The phase diagram for information transmission, for $r = 2$ and $\sigma_J^2 = 1/C$. (a) Threshold
$\xi_0 = -0.4$. (b) Threshold $\xi_0 = +0.0$. (c) Threshold $\xi_0 = +0.4$. (d) Threshold $\xi_0 = +0.8$.
FIG. 5. The marginal noise variance as a function of the sparseness of the output distribution.
The solid line represents the curve for $\xi_0 = -0.40$ (the same situation as Fig. 4(a)), the dashed
curve $\xi_0 = 0.0$, the dot-dashed curve $\xi_0 = +0.40$, and the dotted curve $\xi_0 = +0.80$. Note that for
$\xi_0 = 0.0$ the output sparseness is fixed at $1/\pi$, as explained in the text, so this particular line is
not informative about the relative region of instability.